1. INTRODUCTION

Video foreground extraction aims to classify the pixels of each video frame as foreground or
background. Most existing approaches are based on optical flow, background subtraction, image
differencing, graph-based methods, clustering algorithms, or deep learning. Foreground extraction can be
used in semantic scene understanding, traffic surveillance, recognition, robotics, video indexing, and many
other real-time applications. A lot of research has focused on motion detection, and the proposed methods
can be classified into supervised [1-3] and unsupervised [4-11] methods. In the first category, the
segmentation requires some initial seeds to be selected in the first frame. Fan et al. [1] use a mask
transfer and interpolation method: from the foreground mask in a source frame, they estimate the
foreground in another frame. Wang et al. [3] propose an algorithm that segments video based on a level set
framework and an appearance model; this algorithm requires only a single finger touch on the object in the
first frame. In [17], Rother et al. use iterated graph cuts to extract the foreground.

The second category does not require any user involvement. Over the past decade, many works have
focused on analyzing information such as coherence, motion, and appearance in space-time blobs of video [5],
[14], [8]. Wu et al. [15] propose a method that uses a least squares tracking framework and learned
appearance models to segment and track motion. Khoreva et al. [16] learn the graph by exploiting its edge
topology and weights. Faktor et al. [4] exploit re-occurring regions by constructing a graph for a voting
scheme over re-occurring regions across the video sequence. Vertens et al. [11] use a convolutional neural
network to predict the object label and motion status of each pixel in an image. This category is
important in real-time applications that require an instantaneous understanding of the scene. Motion
segmentation still encounters many challenges, such as occlusion, camera motion, noise, and many other
complex situations, particularly in automatic motion segmentation, which is the focus of our contribution.

The first step in our contribution consists of extracting seeds, that is, pixels known to belong to the
foreground or to the background. The extraction is performed by detecting good features to track [12]
in the previous frame in the RGB color space and computing the difference between those sparse points and
the adjacent points in the current frame; to make the result more accurate, we compute the difference in
the HSV color space. Secondly, we formulate our problem as a graph-based problem and define an energy
function that evaluates a pixel labeling by incorporating spatial and temporal information from the video
sequence. Finally, the random walk algorithm [13] is applied to minimize the energy function and obtain the
final segmentation. Figure 1 gives an overview of our approach.
Figure 1. Overview of our approach


2. THE PROPOSED METHOD 

Our approach aims to group the pixels of video frames into pixels in motion and stationary pixels.
Assigning a label to every pixel in the frame yields the motion segmentation. In this paper, we
propose a method that uses good features to track and frame differencing to detect initial seeds, as
illustrated in Figure 2, and the random walks algorithm to minimize the formulated energy function.





Figure 2. The left frame illustrates the good features to track, and the right frame shows the initial seeds
(blue: in motion, red: stationary)
2.1. Initial Seeds

The extraction of initial seeds is still a major challenge, because most existing methods are based
on probabilistic models and there are many difficult situations that make the detection of initial seeds
inaccurate. In our approach, to detect the initial seeds we use [12] to extract a sparse set of good features
to track in the RGB color space in the current frame, and we compute the difference between each of those
pixels and its adjacent pixels in the previous frame. To increase the accuracy of the result, we compute the
difference in the HSV color space. Thresholding this difference classifies the sparse pixels: if the
difference is larger than a threshold $a_1$, the pixel is labeled as in motion; if it is smaller than a
threshold $a_2$, the pixel is labeled as stationary. These initial seeds are incorporated into the random
walks algorithm to perform the final motion segmentation.
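
The thresholding rule above can be sketched as follows. This is a minimal illustration rather than the
implementation used in the experiments: the text does not specify the exact HSV distance, so the sketch
assumes a sum of absolute channel differences scaled to the range 0 to 255, and the function names
`hsv_difference` and `classify_seed` are hypothetical. The default thresholds are the values of $a_1$ and
$a_2$ reported in the experiments.

```python
import colorsys


def hsv_difference(rgb_a, rgb_b):
    """Illustrative HSV distance between two RGB pixels (channels in 0..255).

    Assumption: sum of absolute H, S, V differences, scaled to 0..255.
    The paper does not state which distance it uses.
    """
    hsv_a = colorsys.rgb_to_hsv(*(c / 255.0 for c in rgb_a))
    hsv_b = colorsys.rgb_to_hsv(*(c / 255.0 for c in rgb_b))
    dh = abs(hsv_a[0] - hsv_b[0])
    dh = min(dh, 1.0 - dh)  # hue is circular, so take the shorter arc
    ds = abs(hsv_a[1] - hsv_b[1])
    dv = abs(hsv_a[2] - hsv_b[2])
    return 255.0 * (dh + ds + dv)


def classify_seed(diff, a1=60.0, a2=1.0):
    """Label a feature point from its frame-to-frame difference.

    Returns "motion" if diff > a1, "stationary" if diff < a2,
    and None when the point is too ambiguous to serve as a seed.
    """
    if diff > a1:
        return "motion"
    if diff < a2:
        return "stationary"
    return None


# Example: an unchanged pixel and a pixel that went from black to white.
same = classify_seed(hsv_difference((10, 10, 10), (10, 10, 10)))
changed = classify_seed(hsv_difference((0, 0, 0), (255, 255, 255)))
```

Points whose difference falls between the two thresholds are left unlabeled, so only confident points act
as seeds for the random walks step.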


2.2. Energy Function 

To achieve our motion segmentation, we formulate an energy function that incorporates spatial and
temporal information. Similar to [19], we obtain the probability $x$ that a pixel is in motion by
minimizing the following energy function:


$$Q[x] = Q_S[x] + \lambda Q_T[x] \tag{1}$$

where $\lambda$ is a free parameter that controls the weighting between the two energies, and $Q_S$
represents the spatial smoothness. This term penalizes differences in $x$ between neighboring pixels in
proportion to their edge weights. It is defined as follows:


$$Q_S[x] = x^T L x = \sum_{e_{ij} \in E} w_{ij} (x_i - x_j)^2 \tag{2}$$


$Q_T$ determines the temporal smoothness. This term penalizes inconsistency with the predicted
confidences $m_i$ and $s_i$. It is defined as follows:


$$Q_T[x] = \sum_{v_i} m_i (1 - x_i)^2 + \sum_{v_i} s_i x_i^2 \tag{3}$$

The energy function (1) can be written in matrix form as follows:

$$Q[x] = x^T L x + \lambda \left( (1 - x)^T M (1 - x) + x^T S x \right) \tag{4}$$


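where $m_i$ represents the probability that pixel $v_i$ belongs to the foreground, and $s_i$ the probability
that pixel $v_i$ belongs to the background. Minimizing the energy, that is, setting the gradient of (4) with
respect to $x$ to zero, leads to solving the following linear system:

$$(L + \lambda (M + S))\, x = \lambda M \mathbf{1} \tag{5}$$

Here $\mathbf{1}$ denotes the all-ones vector.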
The two matrices $M$ and $S$ are diagonal with nonnegative entries. To determine the confidence that pixel
$v_i$ belongs to the foreground, we evaluate the gradient at every pixel of the frame in two directions and
compute the difference between each pixel in frame $t$ and its adjacent pixels in frame $t + 1$. We can then
formulate the confidence $m_i \in \{0,1\}$ as follows:


$$m_i = \begin{cases} 1 & \text{if } |\nabla(g_i^{t}) - \nabla(g_i^{t+1})| > s_1, \\ 0 & \text{otherwise,} \end{cases} \tag{6}$$


where $\nabla(g_i^{t})$ represents the gradient at pixel $v_i$ in frame $t$. As in (6), the confidence
$s_i \in \{0,1\}$ that pixel $v_i$ belongs to the background is defined as follows:


$$s_i = \begin{cases} 1 & \text{if } |\nabla(g_i^{t}) - \nabla(g_i^{t+1})| < s_2, \\ 0 & \text{otherwise,} \end{cases} \tag{7}$$
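
To make the energy formulation concrete, the sketch below computes the confidences of (6) and (7) and then
minimizes the matrix-form energy (4). It is a toy illustration under stated assumptions, not the
implementation used in the experiments: pixels form a one-dimensional chain instead of an image grid, the
gradients are given as plain numbers, the edge weights use a Gaussian of the intensity difference, and the
minimizer is obtained by setting the gradient of (4) to zero, which gives the linear system
$(L + \lambda (M + S)) x = \lambda m$ with $m$ the vector of confidences $m_i$. All function names are
hypothetical.

```python
import math


def confidences(grad_t, grad_t1, s1=20.0, s2=3.0):
    """Binary confidences m_i (foreground) and s_i (background), as in (6) and (7)."""
    diffs = [abs(a - b) for a, b in zip(grad_t, grad_t1)]
    m = [1.0 if d > s1 else 0.0 for d in diffs]
    s = [1.0 if d < s2 else 0.0 for d in diffs]
    return m, s


def chain_laplacian(intensity, beta=0.01):
    """Graph Laplacian of a 1-D pixel chain with Gaussian edge weights (toy graph)."""
    n = len(intensity)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        w = math.exp(-beta * (intensity[i] - intensity[i + 1]) ** 2)
        L[i][i] += w
        L[i + 1][i + 1] += w
        L[i][i + 1] -= w
        L[i + 1][i] -= w
    return L


def solve(A, b):
    """Dense Gaussian elimination with partial pivoting (fine for tiny examples)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def motion_probability(intensity, grad_t, grad_t1, lam=0.1, beta=0.01):
    """Solve (L + lam (M + S)) x = lam m for the per-pixel motion probability x.

    Assumes at least one pixel has a nonzero confidence; otherwise the
    system is singular.
    """
    m, s = confidences(grad_t, grad_t1)
    L = chain_laplacian(intensity, beta)
    n = len(intensity)
    A = [[L[i][j] + (lam * (m[i] + s[i]) if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    return solve(A, [lam * mi for mi in m])


# Example: the two rightmost pixels change between frames, the others do not.
x = motion_probability([10, 10, 10, 200, 200], [0, 0, 0, 50, 50], [0, 0, 0, 0, 0])
```

In this example the strong intensity edge between the third and fourth pixels almost decouples the two
groups, so the left pixels receive a probability near 0 and the right pixels a probability near 1. A real
implementation would use a sparse solver on the image grid instead of dense elimination.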


3. RESEARCH METHOD 

Random walks image segmentation was proposed by Grady [13]. The image is represented
as a graph $G(V, E)$ where each node $v_i \in V$ represents a pixel of the image and each edge
$e_{ij} \in E$ represents the connection between pixel $v_i$ and a neighboring pixel $v_j$. Let $n = |V|$
and $m = |E|$, where $|\cdot|$ denotes cardinality. The edge weight $w_{ij}$ measures the similarity between
connected pixels. The basic idea of random walks segmentation is to start a random walk from each pixel in
the image and compute the probability that it first arrives at each seed. The edge weight can represent the
difference in image intensity, texture information, color, or other features. In [13], Grady applies a
Gaussian weighting function to construct the graph:


$$w_{ij} = \exp\left(-\beta (g_i - g_j)^2\right) \tag{8}$$


Here $g_i$ and $g_j$ are the image intensities at pixels $i$ and $j$, and $\beta$ is a free parameter. To
handle other feature information, we can replace $(g_i - g_j)^2$ with $\|g_i - g_j\|^2$. The segmentation is
achieved by minimizing this energy function:


$$Q[x] = \frac{1}{2} x^T L x = \frac{1}{2} \sum_{e_{ij} \in E} w_{ij} (x_i - x_j)^2 \tag{9}$$


The solution of the random walk problem consists of finding a harmonic function that satisfies the Laplace
equation subject to the boundary conditions:


$$\nabla^2 x = 0 \tag{10}$$


The combinatorial Laplacian matrix $L$ is defined as follows:


$$L_{ij} = \begin{cases} d_i & \text{if } i = j, \\ -w_{ij} & \text{if } v_i \text{ and } v_j \text{ are adjacent nodes,} \\ 0 & \text{otherwise,} \end{cases} \tag{11}$$
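
As a small check of these definitions, the sketch below builds the combinatorial Laplacian from Gaussian
edge weights and verifies numerically that the quadratic form $\frac{1}{2} x^T L x$ equals the edge sum in
(9). The 2 × 2 pixel grid, the intensity values, and the function names are illustrative assumptions.

```python
import math


def gaussian_weight(g_i, g_j, beta=0.01):
    """Edge weight w_ij = exp(-beta * (g_i - g_j)^2), as in (8)."""
    return math.exp(-beta * (g_i - g_j) ** 2)


def build_laplacian(n, edges, intensity, beta=0.01):
    """Return the combinatorial Laplacian (11) and the dict of edge weights."""
    L = [[0.0] * n for _ in range(n)]
    weights = {}
    for i, j in edges:
        w = gaussian_weight(intensity[i], intensity[j], beta)
        weights[(i, j)] = w
        L[i][i] += w   # degree d_i accumulates incident weights
        L[j][j] += w
        L[i][j] -= w   # off-diagonal entries are -w_ij
        L[j][i] -= w
    return L, weights


def energy_from_edges(x, weights):
    """Right-hand side of (9): half the weighted sum of squared differences."""
    return 0.5 * sum(w * (x[i] - x[j]) ** 2 for (i, j), w in weights.items())


def energy_from_laplacian(x, L):
    """Left-hand side of (9): the quadratic form 0.5 * x^T L x."""
    n = len(x)
    return 0.5 * sum(x[i] * L[i][j] * x[j] for i in range(n) for j in range(n))


# 2 x 2 grid, pixels numbered row by row, 4-connected neighborhood.
edges = [(0, 1), (2, 3), (0, 2), (1, 3)]
L, weights = build_laplacian(4, edges, [10, 12, 40, 45])
x = [1.0, 0.8, 0.1, 0.0]
gap = abs(energy_from_edges(x, weights) - energy_from_laplacian(x, L))
```

The value `gap` is zero up to floating-point rounding, and every row of `L` sums to zero, which is why a
constant labeling has zero energy.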


where $L_{ij}$ is indexed by the vertices $v_i$ and $v_j$, and $d_i = \sum_j w_{ij}$, summed over all edges
$e_{ij}$ incident on $v_i$. The vertices are grouped into seeded nodes $V_M$ and unseeded nodes $V_U$.
Without loss of generality, the nodes in $L$ and $x$ are ordered such that seeded nodes come first and
unseeded nodes second. Decomposing equation (9) leads to:


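$$Q[x_U] = \frac{1}{2} \begin{bmatrix} x_M^T & x_U^T \end{bmatrix} \begin{bmatrix} L_M & B \\ B^T & L_U \end{bmatrix} \begin{bmatrix} x_M \\ x_U \end{bmatrix} = \frac{1}{2} \left( x_M^T L_M x_M + 2 x_U^T B^T x_M + x_U^T L_U x_U \right) \tag{13}$$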


$x_M$ and $x_U$ correspond to the potentials of the seeded and unseeded nodes, respectively. Differentiating
$Q[x_U]$ with respect to $x_U$ and setting the result to zero yields $x_U$ as the solution of this equation:


$$L_U x_U = -B^T x_M \tag{14}$$


The final segmentation is obtained by assigning to each node $v_i$ the label $s$ corresponding to
$\max_s (x_i^s)$, where the probabilities at any node $v_i$ sum to unity:


$$\sum_s x_i^s = 1 \tag{15}$$
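
The seeded solve can be illustrated on a toy graph. The sketch below assumes a chain of five nodes with unit
edge weights, one "motion" seed at one end and one "stationary" seed at the other; it solves
$L_U x_U = -B^T x_M$ once per label and checks that the two probabilities sum to one at every node. The
graph, the seed placement, and the function names are illustrative assumptions, and the dense solver is
only suitable for tiny examples.

```python
def solve(A, b):
    """Dense Gaussian elimination with partial pivoting (tiny systems only)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def random_walker(L, seeds, label):
    """Probability, per node, that a walker first reaches a seed carrying `label`.

    L is the graph Laplacian; seeds maps node index -> label.
    """
    n = len(L)
    unseeded = [i for i in range(n) if i not in seeds]
    # x_M is 1 on seeds carrying `label` and 0 on the other seeds.
    x_m = {k: (1.0 if seeds[k] == label else 0.0) for k in seeds}
    L_u = [[L[i][j] for j in unseeded] for i in unseeded]
    # Right-hand side -B^T x_M, using the symmetry of L.
    rhs = [-sum(L[i][k] * x_m[k] for k in seeds) for i in unseeded]
    x_u = solve(L_u, rhs)
    prob = dict(x_m)
    prob.update(zip(unseeded, x_u))
    return prob


# Chain of five nodes with unit weights: 0 - 1 - 2 - 3 - 4.
L = [[1.0, -1.0, 0.0, 0.0, 0.0],
     [-1.0, 2.0, -1.0, 0.0, 0.0],
     [0.0, -1.0, 2.0, -1.0, 0.0],
     [0.0, 0.0, -1.0, 2.0, -1.0],
     [0.0, 0.0, 0.0, -1.0, 1.0]]
seeds = {0: "motion", 4: "stationary"}
p_motion = random_walker(L, seeds, "motion")
p_stationary = random_walker(L, seeds, "stationary")
labels = ["motion" if p_motion[i] > p_stationary[i] else "stationary" for i in range(5)]
```

With unit weights the motion probability decreases linearly along the chain, approximately 0.75, 0.5, and
0.25 at the three unseeded nodes. With only two labels, the second solve is redundant in practice because
the stationary probability is one minus the motion probability.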


Random walk segmentation has desirable properties, such as robustness to weak boundaries and noise, and
avoidance of trivial solutions, in comparison with graph cuts and other segmentation algorithms. These
advantages motivate the choice of the random walks algorithm in our approach.


4. RESULTS AND DISCUSSION 

We implemented our automatic foreground extraction approach in C++ using the OpenCV, Boost, and Eigen
libraries, on a PC with an Intel(R) Core(TM) i5 CPU at 2.40 GHz, 4 GB of RAM, and the Windows 7 operating
system. The method was tested on several videos; Figure 3 illustrates the result obtained on the walk video
sequence with $\lambda = 0.1$, $\beta = 0.01$, $a_1 = 60$, $a_2 = 1$, $s_1 = 20$, and $s_2 = 3$. Graph edges
are constructed from the 4-connected neighborhood of each pixel, and the frame size is 160 × 120. The
algorithmic complexity of our approach is O(n) operations, where n is the number of pixels in the frame. To
solve the linear system of equations (13) efficiently, the matrices are represented as sparse matrices:
instead of storing the entire matrix, only a vector of weights is stored, which decreases the computational
time. Finally, a GPU-based implementation is encouraged for real-time processing tasks such as behavior
analysis and recognition, motion analysis, and event detection.
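
The idea of storing only edge weights can be sketched as a matrix-free Laplacian product on a 4-connected
grid. This is an illustration in Python rather than the C++ code used in the experiments, and the layout is
an assumption: `w_right[p]` holds the weight of the edge between pixel `p` and its right neighbor, and
`w_down[p]` the weight of the edge to the pixel below, so about 2n numbers are stored instead of an n × n
matrix.

```python
def laplacian_matvec(x, w_right, w_down, width, height):
    """Compute y = L x for a 4-connected grid without forming L.

    Pixels are numbered row by row. Entries of w_right in the last column
    and of w_down in the last row are never read.
    """
    y = [0.0] * (width * height)
    for r in range(height):
        for c in range(width):
            p = r * width + c
            if c + 1 < width:           # edge to the right neighbor
                q = p + 1
                d = w_right[p] * (x[p] - x[q])
                y[p] += d
                y[q] -= d
            if r + 1 < height:          # edge to the neighbor below
                q = p + width
                d = w_down[p] * (x[p] - x[q])
                y[p] += d
                y[q] -= d
    return y


# 2 x 2 grid: multiplying by a unit vector returns one column of L.
w_right = [2.0, 0.0, 3.0, 0.0]
w_down = [1.0, 4.0, 0.0, 0.0]
col0 = laplacian_matvec([1.0, 0.0, 0.0, 0.0], w_right, w_down, 2, 2)
flat = laplacian_matvec([5.0, 5.0, 5.0, 5.0], w_right, w_down, 2, 2)
```

Each product costs a number of operations proportional to the number of pixels, which is the building block
an iterative solver such as conjugate gradient needs.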





Figure 3. Experimental results of our approach on the walk video sequence: the first row shows the
good features to track, the second row shows the initial seeds (blue: in motion, red: stationary), and
the third row displays the final segmentation


5. CONCLUSION 

We have presented a spatiotemporal video segmentation approach for motion detection, formulating
automatic foreground extraction as a graph-based problem. In addition to the spatial coherence used in
image segmentation, we exploit temporal information by introducing a likelihood term in the energy
function; this term penalizes the similarity between adjacent pixels in the current frame and the next
frame. As in many other motion segmentation methods, the random walks algorithm is applied to minimize
the defined energy function and solve the labeling problem, yielding the final pixel classification.
Beyond its performance, the advantage of our method is that it does not require any human interaction, so
it can be used in real-time applications. In future work, we will further improve the efficiency of our
approach and make the results more accurate.